perf(stark): fuse deep-composition reconstruction for both FRI points#775
Open
Oppen wants to merge 3 commits into
Reconstructing the deep composition polynomial per query walked the OOD table and the trace-term coefficients, and inverted the trace/comp denominators, independently for the regular and symmetric evaluation points, even though the OOD-derived terms don't depend on the point. Rewriting coeff*(base-ood)*denom as denom*(coeff*base - coeff*ood) isolates the point-independent coeff*ood term (computed once, shared between both points) from coeff*base (a cheap base*ext multiply for base-field columns), and lets both denominator sets be batch-inverted together.

Multi-query recursion profile: 2,210,366,539 -> 1,951,764,531 cycles (-11.7%); step 3 (FRI) 912M -> 652M cycles (-28.5%).
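The rewrite described in this commit message can be checked on a toy model. The sketch below is not the PR's code: it works over the plain Goldilocks prime field with u64 values, uses a single denominator per point (the real verifier has one per OOD row), and every function name in it is hypothetical. It only illustrates that summing coeff*ood once and reusing it for both points gives the same result as the per-point form.

```rust
// Toy model: Goldilocks prime field with plain u64 values (assumed reduced mod P).
// All names here are illustrative and are not taken from the PR.
const P: u64 = 0xFFFF_FFFF_0000_0001; // 2^64 - 2^32 + 1

fn add(a: u64, b: u64) -> u64 { ((a as u128 + b as u128) % P as u128) as u64 }
fn sub(a: u64, b: u64) -> u64 { ((a as u128 + P as u128 - b as u128) % P as u128) as u64 }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

/// Square-and-multiply exponentiation.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Fermat inverse; returns None for zero.
fn inv(a: u64) -> Option<u64> {
    if a == 0 { None } else { Some(pow(a, P - 2)) }
}

/// sum_i coeffs[i] * vals[i]
fn dot(coeffs: &[u64], vals: &[u64]) -> u64 {
    coeffs.iter().zip(vals).fold(0, |acc, (&c, &v)| add(acc, mul(c, v)))
}

/// Per-point form: sum_i coeff_i * (base_i - ood_i) * denom.
fn deep_term_naive(coeffs: &[u64], bases: &[u64], oods: &[u64], denom: u64) -> u64 {
    coeffs.iter().zip(bases).zip(oods).fold(0, |acc, ((&c, &b), &o)| {
        add(acc, mul(mul(c, sub(b, o)), denom))
    })
}

/// Fused form for two points: the coeff*ood sum is computed once and shared.
fn deep_terms_fused(
    coeffs: &[u64],
    bases_a: &[u64],
    bases_b: &[u64],
    oods: &[u64],
    denom_a: u64,
    denom_b: u64,
) -> (u64, u64) {
    let shared = dot(coeffs, oods); // point-independent
    let term_a = mul(denom_a, sub(dot(coeffs, bases_a), shared));
    let term_b = mul(denom_b, sub(dot(coeffs, bases_b), shared));
    (term_a, term_b)
}
```

Calling deep_terms_fused with the column values at the regular and symmetric points returns the same pair as two deep_term_naive calls, while walking coeffs and oods together only once.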
The doc comment on reconstruct_deep_composition_poly_evaluation_pair restated the inline comment three lines into the function body.
reconstruct_deep_composition_poly_evaluation_pair walked the z*g^k
ladder twice per query (once per evaluation point) though it doesn't
depend on which point is being evaluated; interleave the two
denominator pushes into one walk instead. The 2-element composition-
tail batch inverse also went through the general Vec-allocating
inplace_batch_inverse; hand-roll it (one inversion, three muls, no
allocation).
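As a rough illustration of the two changes described above, the sketch below walks a z*g^k ladder once while pushing the denominators for both evaluation points, and inverts a pair of elements with one field inversion and three multiplications. It is a toy over the Goldilocks prime field with u64 values; the function names, the denominator convention (x - z*g^k), and the Option-based zero rejection are assumptions for illustration, not the PR's actual signatures.

```rust
// Toy model over the Goldilocks prime field; names are hypothetical.
const P: u64 = 0xFFFF_FFFF_0000_0001; // 2^64 - 2^32 + 1

fn sub(a: u64, b: u64) -> u64 { ((a as u128 + P as u128 - b as u128) % P as u128) as u64 }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Fermat inverse; returns None for zero.
fn inv(a: u64) -> Option<u64> {
    if a == 0 { None } else { Some(pow(a, P - 2)) }
}

/// Walks z, z*g, z*g^2, ... once and pushes, for each rung, the denominator
/// for the point x followed by the denominator for the symmetric point -x.
fn ladder_denominators(x: u64, z: u64, g: u64, rows: usize) -> Vec<u64> {
    let neg_x = sub(0, x);
    let mut out = Vec::with_capacity(2 * rows);
    let mut zg = z;
    for _ in 0..rows {
        out.push(sub(x, zg));
        out.push(sub(neg_x, zg));
        zg = mul(zg, g); // advance the ladder a single time for both points
    }
    out
}

/// Inverts two elements with one inversion and three multiplications,
/// without allocating. Returns None if either input is zero.
fn batch_inverse_pair(a: u64, b: u64) -> Option<(u64, u64)> {
    let ab_inv = inv(mul(a, b))?; // a*b == 0 exactly when a or b is zero
    Some((mul(ab_inv, b), mul(ab_inv, a))) // (1/a, 1/b)
}
```

The pair inverse relies on 1/a = b/(a*b) and 1/b = a/(a*b), so a zero in either input makes the product zero and is rejected in the same place.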
Also mark the IsSubFieldOf<Degree{2,3}GoldilocksExtensionField> mul/
add/sub impls for GoldilocksField #[inline(always)], matching the
sibling IsField impls on the same types — these are exactly the cheap
asymmetric ops the deep-composition reconstruction fusion relies on.
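To make the "cheap asymmetric op" concrete, here is a minimal sketch of a base-by-extension multiply next to a full extension multiply. It models a quadratic extension of the Goldilocks field as Fp[x]/(x^2 - 7); the struct, the function names, and the choice of 7 as the non-residue are assumptions for illustration and are not the library's IsSubFieldOf impls.

```rust
// Toy quadratic extension Fp[x]/(x^2 - W) over Goldilocks; names are hypothetical.
const P: u64 = 0xFFFF_FFFF_0000_0001; // 2^64 - 2^32 + 1
const W: u64 = 7; // assumed non-residue, chosen for illustration

fn add(a: u64, b: u64) -> u64 { ((a as u128 + b as u128) % P as u128) as u64 }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Fp2 {
    c0: u64,
    c1: u64,
}

/// Base-field element times extension element: two base multiplications.
#[inline(always)]
fn mul_base_ext(b: u64, e: Fp2) -> Fp2 {
    Fp2 { c0: mul(b, e.c0), c1: mul(b, e.c1) }
}

/// Full extension product (schoolbook):
/// (a0 + a1 x)(b0 + b1 x) = (a0 b0 + W a1 b1) + (a0 b1 + a1 b0) x
fn mul_ext(a: Fp2, b: Fp2) -> Fp2 {
    Fp2 {
        c0: add(mul(a.c0, b.c0), mul(W, mul(a.c1, b.c1))),
        c1: add(mul(a.c0, b.c1), mul(a.c1, b.c0)),
    }
}
```

Embedding a base element b as Fp2 { c0: b, c1: 0 } and calling mul_ext gives the same result as mul_base_ext, but spends multiplications on terms that are known to be zero. With a body this small, call overhead can be comparable to the work itself, which is the usual motivation for an inline hint.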
Multi-query recursion profile (two make-rebuilt runs per side, since
the underlying proof has run-to-run variance): baseline 1,947,221,680
/ 1,947,263,739 cycles; with this change 1,939,869,854 / 1,939,845,966
(about -7.38M, -0.38%). No correctness change (in-VM verify output digest
identical to the prior commit).
Summary
- Fuses the two per-query calls to reconstruct_deep_composition_poly_evaluation (STARK verifier, FRI step) into one traversal, sharing the OOD-table walk, the trace_term_coeffs walk, and both inplace_batch_inverse calls per query. The core trick: coeff*(base-ood)*denom distributes to denom*(coeff*base - coeff*ood), isolating the point-independent coeff*ood term (computed once, shared between both points) and letting base-field columns use the cheap IsSubFieldOf asymmetric multiply instead of a full extension-field product.
- A follow-up fuses the z*g^k ladder walk within a query (it was computed twice, once per evaluation point), hand-rolls the 2-element composition-tail batch inverse to avoid a Vec allocation, and marks the IsSubFieldOf<Degree{2,3}GoldilocksExtensionField> mul/add/sub impls #[inline(always)] to match their sibling IsField impls.

Measured impact
Multi-query recursion profile (make test-profile-recursion-multi; the underlying proof has ~40-45K cycles of run-to-run variance from stark::grinding::generate_nonce's parallel find_any search, so all deltas below are 2-sample-cluster-to-cluster, not point-to-point):
- 85fa5b3e: ~1,947,242,710 cycles (-11.9%)
- 3ac5e494: ~1,939,857,910 cycles (a further -0.38%, -12.2% total from baseline)
Single-query and step-3 (FRI) improvements are proportionally similar; see
individual commit messages for per-commit numbers.
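For readers who want to reproduce the cluster-to-cluster arithmetic, a small sketch follows. The sample values in the usage note are the four cycle counts quoted in the third commit message; the helper names are made up for this example.

```rust
/// Mean of a cluster of cycle-count samples.
fn cluster_mean(samples: &[u64]) -> f64 {
    samples.iter().map(|&s| s as f64).sum::<f64>() / samples.len() as f64
}

/// Relative change of the `after` cluster mean versus the `before`
/// cluster mean, in percent (negative means fewer cycles).
fn relative_delta_percent(before: &[u64], after: &[u64]) -> f64 {
    let b = cluster_mean(before);
    let a = cluster_mean(after);
    (a - b) / b * 100.0
}
```

With before = [1_947_221_680, 1_947_263_739] and after = [1_939_869_854, 1_939_845_966], relative_delta_percent comes out at roughly -0.38, matching the figure reported for the last commit. Comparing means rather than individual runs keeps the tens-of-thousands-of-cycles nonce-search noise from dominating a delta of a few million cycles.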
Soundness
Every guard the two-call version had (base/aux-length checks, the trace_term_coeffs sanity check, batch-inverse zero-rejection, the composition-poly-parts-count check) is reproduced exactly or made stricter in the fused version. This was verified by independent adversarial review agents against the reject-path semantics, the algebraic identity, and the soundness threat model (a dishonest prover cannot craft a proof that this code accepts and the prior code would have rejected).
Test plan
- make test (workspace-wide; the only failure is math-cuda, which needs a GPU unavailable in this environment; pre-existing, unrelated)
- make test-ethrex
- cargo test -p lambda-vm-prover --lib test_recursion_execute_1query -- --ignored --nocapture (in-VM end-to-end verify, output digest unchanged across all three commits)
then a second round covering performance/simplicity) on the diff